Abstract



We consider the classical question of predicting binary sequences and study the optimal algorithms
for obtaining the best possible regret and payoff functions for this problem. The question turns out to
also be equivalent to the problem of optimal trade-offs between the regrets of two experts in an "experts
problem", studied before by [14]. While, say, a regret of $\Theta(\sqrt{T})$ is known, we argue that it is
important to ask what is the provably optimal algorithm for this problem, both because it leads to natural
algorithms and because regret is in fact often comparable in magnitude to the final payoffs and
hence is a non-negligible term.

In the basic setting, the result essentially follows from a classical result of Cover from '65. Here 
instead, we focus on another standard setting, of time-discounted payoffs, where the final "stopping 
time" is not specified. We exhibit an explicit characterization of the optimal regret for this setting. 

To obtain our main result, we show that the optimal payoff functions have to satisfy the Hermite
differential equation, and hence are given by the solutions to this equation. It turns out that the
characterization of the payoff function is qualitatively different from the classical (non-discounted)
setting: namely, there is essentially a unique optimal solution.

1 Introduction 

Consider the following classical game of predicting a binary ±1 sequence. The player (predictor) sees a
binary sequence $\{b_t\}_{t \ge 1}$, one bit at a time, and attempts to predict the next bit $b_t$ from the
past history $b_1, \ldots, b_{t-1}$. The payoff (score) of the algorithm is then the count of correct guesses
minus the number of wrong guesses, formally defined as follows, for some target time $T > 0$, and where
$\tilde{b}_t$ is the prediction at time $t$:

$$A_T = \sum_{1 \le t \le T} b_t \tilde{b}_t.$$

One can view this game as an idealized "stock prediction" problem as follows. Each day, the stock price 
goes up or down by precisely one dollar, and the player bets on this event. If the bet is right, the player wins 
one dollar, and otherwise she loses one dollar. Not surprisingly, in general, it is impossible to guarantee a
positive payoff for all possible scenarios (sequences), even for randomized algorithms. However, one could 
hope to give some guarantees when the sequence has some additional property. 

The above sequence prediction problem is in fact precisely equivalent to the two experts problem (or
multi-armed bandits problem), where one considers two experts, via a reduction: one side of the reduction
follows simply by using two experts, one that always predicts "+1" and another that always predicts "-1".
Then one measures the regret of an algorithm: how much worse the algorithm does compared to the best of the
two experts (in hindsight, after seeing the sequence), whose payoff equals $|\sum_{1 \le t \le T} b_t|$. We
henceforth refer to $\sum_{1 \le t \le T} b_t$ as the "height" of the sequence (as in the height of a growth
chart of a stock). Regret has been studied in a number of papers, including [T2 [HI [El [3 [1]. A classical
result says that one can obtain a regret of $\Theta(\sqrt{T})$ for a sequence of length $T$, via, say, the
weighted majority algorithm of [22]. Note that the payoff per time step $b_t \tilde{b}_t$ is essentially
equivalent to the well known absolute loss function $|b_t - \tilde{b}_t|$ (see
for example [TU], chapter
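The definitions of payoff, height and regret above can be made concrete with a short sketch. The function names and the toy predictor `follow_last_bit` are illustrative assumptions, not part of the paper; the regret is measured against the better of the two constant experts, whose payoff is the absolute value of the height.

```python
from typing import Callable, Sequence, Tuple

def play(bits: Sequence[int],
         predictor: Callable[[Sequence[int]], float]) -> Tuple[float, int, float]:
    """Run a predictor on a +/-1 sequence; return (payoff, height, regret)."""
    payoff = 0.0
    for t, bit in enumerate(bits):
        bet = predictor(bits[:t])      # the bet may depend only on past bits
        payoff += bit * bet            # gains |bet| if the sign is right, loses it otherwise
    height = sum(bits)                 # payoff of the "always +1" expert
    regret = abs(height) - payoff      # the best constant expert earns |height|
    return payoff, height, regret

def follow_last_bit(history: Sequence[int]) -> float:
    """Hypothetical toy predictor: repeat the previous bit (bet 0 at the start)."""
    return float(history[-1]) if history else 0.0

bits = [1, 1, -1, 1, 1, 1, -1, -1]
payoff, height, regret = play(bits, follow_last_bit)
```

Passing `lambda history: 1.0` or `lambda history: -1.0` as the predictor reproduces the two constant experts, whose payoffs are the height and the negative height respectively.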

Obtaining a regret of $\Theta(\sqrt{T})$ has since become the gold standard for many similar expert learning
problems. But is this the best possible guarantee? While there is a lower bound of $\Omega(\sqrt{T})$, it is
natural to ask what is the optimal algorithm for minimizing the regret, departing from asymptotic notation.
Note that the weighted majority algorithm may not be optimal, even if it obtains the "right order of
magnitude". More generally, one can ask what exactly are all possible payoff functions one can achieve as a
function of the total payoffs of the two arms (height, in our case).

In this paper, we undertake precisely this task: we study the algorithms that obtain the optimal,
minimal regret possible and characterize the possible payoff functions. Our results also lead to optimal
regret trade-offs between two experts in the experts problem, by the equivalence between the two problems.
The latter problem has been previously studied by [14], and later by [20], to address, say, an investment
scenario where there may be two experts, one risk-taking and another conservative, and one may be willing to
take different regrets with respect to these two experts. In particular, it is known that it is possible to
get regret $O(\sqrt{T \log T})$ with respect to one expert and $1/T^{\Omega(1)}$ with respect to the other.

There are several reasons to study such optimal algorithms and compute the exact trade-off curves. First
of all, such an optimal algorithm may be viewed as more "natural", for example, because if an autonomous
system has the same optimization criterion (of minimizing regret), it would arrive at such an "optimal"
solution. Second, it is worthwhile to go beyond the asymptotics of a $\Theta(\sqrt{T})$ bound. Specifically,
often the final value of a sequence is actually of the order of $\sqrt{T}$, such as for a random sequence.
Although we do not expect to obtain a positive payoff for a random sequence, a large fraction of all
sequences still have $O(\sqrt{T})$ value. In such a scenario, it is critical to obtain the best possible
constant in front of the $\sqrt{T}$ regret bound. When the value of the sequence is indeed around
$\Theta(\sqrt{T})$, an algorithm with a regret of $\Theta(\sqrt{T})$ achieves a constant factor
approximation, and improving the leading constant leads to a better approximation factor. For example, in
several investment scenarios it is known that the payoffs of the experts (or stocks) in time $T$ are barely
more than $O(\sqrt{T})$ (see, for example, the Hurst coefficient measurements of financial markets in
[SI US]). In such settings, the precise constant in the regret term can translate into a difference between
gain and loss. Indeed, we find that our algorithm can have a regret that is about 10% lower than that of the
well known weighted majority algorithm and, at several positions on the curve, our payoff is improved by as
much as $0.3\sqrt{T}$ (see figure 1(a)). We also obtain the exact trade-off curve between the regrets with
respect to two arms (see figure 2).

We note that, in the vanilla setting, when there is a time bound $T$, the solution already follows from the
results of [12] (see also [Hllin]), who gave a characterization of all possible payoffs back in 1965. One can
also obtain the optimal algorithm by dynamic programming, similar to an approach from [21]. Yet, the
resulting algorithm has a betting strategy and payoff function that are time-dependent and also depend on
the final stopping time $T$. These dependencies introduce issues and parameters that are hard to control in
reality (often the predictor does not really know when the time "stops"). To understand the time-independent
strategies, we are led to consider another classic setting, that of time-discounted payoffs (see [muz]).

Thus we focus our study on regret-optimal algorithms in the time-discounted setting, where the payoff is
discounted and there is no a priori time bound. Formally, we define a $\rho$-discounted version of the payoff
at some moment of time $T$, for a discount factor $\rho \in (0, 1)$, as

$$A_T^\rho = \sum_{t \ge 0} b_{T-t} \tilde{b}_{T-t} \rho^t.$$

The question then is to minimize the regret with respect to this quantity, as a function of the (discounted)
height. One can also see this scenario as capturing the situation where we care about a certain "attention"
window of time (given by $\rho$). One of the consequences of our study is that, when the strategies are
time-independent, the characterization of the optimal regret and algorithms becomes quite different.

^1 Since when $|b_t| = 1$, we have $|b_t - \tilde{b}_t| = |b_t| |b_t - \tilde{b}_t| = |1 - b_t \tilde{b}_t|
= 1 - b_t \tilde{b}_t$. Thus the absolute loss function is the negative of our payoff in one step plus a
shift of 1. Also, $b_t$ values from $\{-1, 1\}$ or $\{0, 1\}$ are equivalent by a simple scaling and shifting
transform.
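As a small illustration of the discounted quantities, the sum $\sum_{t \ge 0} v_{T-t} \rho^t$ can be maintained online by the update "multiply by $\rho$, then add the newest term". The sketch below uses hypothetical helper names; the discounted height is the same sum applied to the bits themselves.

```python
from typing import Sequence

def discounted_sum(values: Sequence[float], rho: float) -> float:
    """Return sum_{t>=0} values[T-t] * rho**t, where values[T] is the last entry."""
    total = 0.0
    for v in values:                 # oldest value first
        total = rho * total + v      # every older term picks up one more factor of rho
    return total

def discounted_payoff(bits: Sequence[int], bets: Sequence[float], rho: float) -> float:
    """Discounted payoff: per-step payoffs bit * bet, discounted by rho."""
    return discounted_sum([b * p for b, p in zip(bits, bets)], rho)

def discounted_height(bits: Sequence[int], rho: float) -> float:
    """Discounted height: the same discounted sum applied to the bits."""
    return discounted_sum(bits, rho)
```

With $\rho = 1 - 1/n$, a long run of +1 bits drives the discounted height towards $1/(1 - \rho) = n$, which is one way to read $n$ as a window size.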



1.1 Statement of Results 

In general, we study the optimal regret curves. Namely, we measure the payoff and regret as a function of the
"height" of the sequence (the sum of the bits of the sequence, as defined above; one can also take a
discounted sum). Note that comparing against height amounts to comparing the performance of our algorithm
against that of two static experts: one that always predicts +1 ("long the stock"), and another that always
predicts -1 ("short the stock"). The former obtains a payoff equal to the height and the latter obtains a
payoff equal to the negative height.

We use the notion of a payoff function: a real function $f$, which assigns the algorithm's payoff $f(x)$ to
each height value $x$. In particular, for a fixed algorithm and a height $x$, let $f_T(x)$ denote the minimum
payoff over all sequences with height $x$ at time $T$. For a certain function $f_T$, we will say that
$f_T(x)$ is feasible if there is an algorithm with payoff at least $f_T(x)$ over all possible sequences
$\{b_t\}$ such that $h(\{b_t\}) = x$. In the discounted scenario, the notion of height becomes the discounted
height: $h_\rho(\{b_t\}_{t \le T}) = \sum_{t \ge 0} b_{T-t} \rho^t$. More importantly, for time-independent
strategies (in the discounted setting) we will say that $f(x)$ is feasible if the payoff is at least $f(x)$
for (discounted) height $x$ at all times (feasible in steady state).

Our goal will be to optimize the regret, defined for a payoff function $f$ as follows:

$$R(f) = \max_x \, |x| - f(x),$$

where $x$ ranges over all possible (discounted) heights.

Note that $|x|$ is the maximum of the payoffs of the two constant experts. In general, we allow bets
$\tilde{b}_t$ to be bounded reals in the interval $[-1, 1]$. In such a case, it is sufficient to consider
deterministic strategies only. One can also consider the version of the problem where there is no restriction
on the range of values for $\tilde{b}_t$. We will refer to this case as the sequence prediction problem with
unbounded bets. This will be useful in deriving bounds for the standard case with bounded bets.

For starters, we recall the result for the vanilla, non-discounted, fixed stopping time setting, which
follows from [12], and is related to the Rademacher complexity of the predictions of the two experts (see
[9, 10]). The theorem below also extends to the discounted scenario, with fixed stopping time $T$. See
Appendix A.2 for a discussion of this setting.

Theorem 1.1. Consider the problem of prediction of a binary sequence. The minimal possible regret is

$$R = \sqrt{\frac{2}{\pi} T} + O(1).$$

There is a prediction algorithm (betting strategy) that achieves this optimal regret and has
$f(x) = |x| - R$. The actual corresponding betting strategy may be computed via dynamic programming.

Furthermore, $f$ is feasible iff $\sum_x f(x) p(x) = 0$, where $p(x)$ is the probability of a random walk of
length $T$ to end at $x$ (i.e., $E[f(x)] = 0$ for $x$ being the height of a random sequence). For bounded bet
values, we have the additional constraint that $f$ is 1-Lipschitz.

Time-independent strategies. Our main result is for optimal regret curves in the setting of discounted
and time-independent strategies. We characterize the set of all-time feasible $f$'s. For this, we define a
certain "optimal" curve function, which will be central to our claims. For constants $c_1, c_2$, define the
following function:

$$F_{c_1,c_2}(x) = c_1 \left( x \cdot \mathrm{erfi}(x) - e^{x^2}/\sqrt{\pi} \right) + c_2 x,$$

where $\mathrm{erfi}(x) = -i \cdot \mathrm{erf}(ix)$ is the imaginary error function. We also define
$\bar{F}_{c_1,c_2}$ to be the function obtained by bounding the derivative of $F$ to lie in $[-1, 1]$. That
is, $\bar{F}' = F'$ when $|F'| \le 1$ and $\bar{F}' = \mathrm{sign}(F')$ when $|F'| > 1$.

Theorem 1.2 (Main). Consider the problem of discounted prediction of a binary sequence with the discount
factor $\rho = 1 - 1/n$ (corresponding to a "window size" of $n$). A payoff function $f$ is feasible in the
steady state if there exist constants $c_1, c_2$ such that for all $x \in [-n, n]$:

$$f(x) \le \sqrt{n} \cdot \bar{F}_{c_1,c_2}(x/\sqrt{n}) - O(1).$$

Conversely, if there exists a function $g$ such that $f(x) = \sqrt{n} \cdot g(x/\sqrt{n})$ for infinitely
many $n$ and $g$ is piecewise analytic^2, then $g(x) \le F_{c_1,c_2}(x)$ for some constants $c_1, c_2$.

Hence, the minimum $\rho$-discounted regret is, for
$C = \min_{\alpha > 1} \frac{\alpha}{\sqrt{\pi} \cdot \mathrm{erfi}(\sqrt{\ln \alpha})}$,

$$\min_{f_\rho} R(f_\rho) = C \sqrt{n} + O(1).$$
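The leading constant can be estimated numerically from the definitions of $R(f)$ and of the capped curve. The sketch below is an illustration under stated assumptions, not the paper's computation: it takes the symmetric choice $c_2 = 0$, works in units of $\sqrt{n}$, evaluates erfi by its Maclaurin series (a hypothetical stdlib-only helper), and parametrizes the curve by the point $x_0$ where the slope $F'(x_0) = c_1 \mathrm{erfi}(x_0)$ reaches 1.

```python
import math

def erfi(x: float, terms: int = 80) -> float:
    """Imaginary error function via its Maclaurin series (adequate for moderate |x|)."""
    total, term = 0.0, x                 # term holds x**(2k+1) / k!
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= x * x / (k + 1)
    return 2.0 / math.sqrt(math.pi) * total

def F(x: float, c1: float, c2: float = 0.0) -> float:
    """Solution of the Hermite equation used in the text."""
    return c1 * (x * erfi(x) - math.exp(x * x) / math.sqrt(math.pi)) + c2 * x

def capped_regret(x0: float) -> float:
    """max_x |x| - Fbar(x) for the symmetric capped curve whose slope hits 1 at x0."""
    c1 = 1.0 / erfi(x0)                  # makes F'(x0) = c1 * erfi(x0) = 1
    # For |x| >= x0 the capped curve has slope 1, so |x| - Fbar(x) stays at x0 - F(x0);
    # for |x| < x0 the gap |x| - F(x) is increasing in |x|, so the maximum sits at x0.
    return x0 - F(x0, c1)

grid = [0.2 + 0.001 * i for i in range(2000)]
x_star = min(grid, key=capped_regret)    # grid search over the cap point
C = capped_regret(x_star)
alpha = math.exp(x_star ** 2)            # the same point in the alpha parametrization
```

Substituting $\alpha = e^{x_0^2}$ turns `capped_regret(x0)` into $\alpha / (\sqrt{\pi} \, \mathrm{erfi}(\sqrt{\ln \alpha}))$, so the grid minimum is an estimate of $C$ under these assumptions.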



We note that the above characterization follows from a "limit view" of the corresponding dynamic 
programming characterizing the payoff function, which leads to a differential equation formulation of the 
question. Such an approach has been previously undertaken by [5T] to show that many differential equations 
can be realized as two-person games, as is also the case in our scenario. 



In particular, to prove Theorem 1.2 we show that $f$ needs to satisfy the inequality

$$f(x) \ge \frac{f(\rho x + 1) + f(\rho x - 1)}{2\rho}. \qquad (1)$$

It turns out that, after the correct rescaling, and taking the process to the limit, we obtain a differential
equation. Namely, let $g(x) = f(\sqrt{n} x)/\sqrt{n}$ denote a normalized version of $f$ where the axes are
scaled down by a factor of $\sqrt{n}$ (the standard deviation of the height). We will assume that $g$ is
(piecewise) analytic^3. Then, as $n$ approaches infinity, the above inequality implies the following
differential inequality:

$$g'' - 2xg' + 2g \le 0.$$

If we replace the inequality by equality, we obtain the Hermite differential equation, which has as its
solutions the aforementioned functions $F_{c_1,c_2}$. While our solutions are close to these differential
equation solutions $F_{c_1,c_2}$, we also point out the curious fact that if we insist on the constraint (1)
being an equality, then the only solutions are the linear functions $f(x) = cx$ (see Remark 2.8). Thus the
relaxation into an inequality seems necessary to capture the feasible set of functions.

The algorithm from the above theorem is explicitly given. In particular, it computes the current discounted
height $x$, and then outputs the bet $\tilde{b}(x) = \frac{f(\rho x + 1) - f(\rho x - 1)}{2}$ for the next
time step, for $f$ from Theorem 1.2. Surprisingly, the characterization of the feasible payoff functions $f$
is very different when the strategies are time-independent as opposed to the time-dependent case. In
particular, in the time-independent case, there are only two degrees of freedom, as compared to the
time-dependent case where there are infinitely many (or $\approx n$) degrees of freedom.



See figure 1(a) for the plots of the resulting betting strategy as compared to the one resulting from the
multiplicative weights update algorithm (which also happens to be a time-independent strategy). Also, see
figure 1(b) for the resulting payoff function $f$ (where the axes have been scaled down by $\sqrt{n}$). After
scaling $x$ down by $\sqrt{n}$, we obtain that $\tilde{b}(x)$ tends to $F'(x)$ as $n \to \infty$.



Trading off regrets between two experts. We also relate our problem to the experts problem with two
experts (or the multi-armed bandit problem in the full information model with two arms/experts). Here, in
each round, each expert has a payoff in the range $[0, 1]$ that is unknown to the algorithm. For two experts,
let $b_{1,t}, b_{2,t}$ denote the payoffs of the two experts at time $t$. The algorithm pulls each arm
(expert) with probability $\tilde{b}_{1,t}, \tilde{b}_{2,t} \in [0, 1]$ respectively, where
$\tilde{b}_{1,t} + \tilde{b}_{2,t} = 1$. The payoff of the algorithm in this setting is
$A_T := \sum_{t=1}^{T} b_{1,t} \tilde{b}_{1,t} + b_{2,t} \tilde{b}_{2,t}$. The objective of the algorithm is
to obtain low regret with respect to the two experts. We note that this was first studied in [14].

^2 In fact, it suffices to assume that the first three derivatives of $g$ exist instead of requiring it to be analytic.
^3 In fact all we will need is that it is twice differentiable.
(a) Scaled graphs of the betting strategies for $n = 100$: the weighted majority betting strategy
$\tilde{b}(x) = \tanh(x)$ (blue) and the betting strategy resulting from Theorem 1.2, which is equal to
$c \cdot \mathrm{erfi}(x)$ capped at ±1 (red). (The $x$ axis has been scaled down by $\sqrt{n}$.)

(b) Scaled graphs of the payoff curves $f(x)$, for $n = 100$, for the weighted majority algorithm (blue) and
the solution resulting from Theorem 1.2 (red). (The $x$ and $y$ axes have been scaled down by $\sqrt{n}$.)

Figure 1: Graphs for the prediction of binary sequences problem.



We achieve the optimal tradeoff between the regrets with respect to the two experts by reducing it to 
an instance of the sequence prediction problem. In particular, define the loss of a payoff function / as the 
negative of the minimum value of /. Then we show that the regret/loss trade-off for the sequence prediction 
problem is tightly connected to the trade-off of the regrets with respect to the two experts. Hence, we also 
derive the regret trade-off for the case of two experts. (See figure 2 for the trade-off curve for the regrets in
the two experts problem.)



Figure 2: Tradeoff between two regrets $R_1, R_2$ (scaled down by $\sqrt{n}$) for the two experts problem
(time-discounted case).



Theorem 1.3. Consider the problem of trading off regrets $R_1, R_2$ with respect to two experts. Regrets
$R_1, R_2$ are achievable in the time-discounted setting if and only if there exists an $\alpha > 0$ such that
$T(\alpha R_1/\sqrt{n}) + T(\alpha R_2/\sqrt{n}) \ge \alpha/\sqrt{\pi}$, where
$T(x) = \mathrm{erfi}(\sqrt{\ln x})$ for $x \ge 1$.

Multiple scales. Finally, we investigate the possible payoff functions at multiple time scales $\rho$ (window
sizes). Several earlier papers considered regrets at different time scales; see O [16, 29, 20]. We consider
two different time scales, $\rho_1 = 1 - 1/n_1$ and $\rho_2 = 1 - 1/n_2$, although a similar result can be
obtained for a larger number of time scales.

We exhibit the necessary and sufficient condition for a feasible payoff function, as the window size goes to
infinity. In particular, suppose that $n_1 = a_1 n$ and $n_2 = a_2 n$, where $n$ tends to infinity. Let $x_1$
and $x_2$ be the time-discounted heights for the two different time scales. We can ask if it is possible to
get (time-discounted) payoff functions $f_1(x_1, x_2)$ and $f_2(x_1, x_2)$ at time scales $n_1$ and $n_2$
respectively. Again we apply the coordinate rescaling by $\sqrt{n}$ for both $x_1, f_1$ and $x_2, f_2$.

Theorem 1.4. For $n \ge 1$, fix two windows $n_1 = a_1 n$ and $n_2 = a_2 n$. As $n$ goes to infinity, there
are payoff functions $f_1(x_1, x_2) = \sqrt{n} g_1(x_1/\sqrt{n}, x_2/\sqrt{n})$ for the discount rate
$\rho_1 = 1 - 1/n_1$ and $f_2(x_1, x_2) = \sqrt{n} g_2(x_1/\sqrt{n}, x_2/\sqrt{n})$ for the discount rate
$\rho_2 = 1 - 1/n_2$ if and only if the following system of partial differential inequalities is satisfied:

^ ( 9 , d \^ ^, , ( d I „2^ 9 \ „ „2 

-^1 - ~2 (^aST + asij 91 + [aixi 0^^+02X2 - 



E2 
El 



2 

E2 > 



1 fjL 

2 \dxi 



a 

9X2 



92 



(alxi- 



91 

92 



51 >0 

52 >0 



We do not seem to have an explicit analytical solution for the above system of inequalities, and so perhaps
one would have to rely on numerical simulations to solve it. This part is deferred to Appendix C due to
space limitations.



1.2 Related Work 

There is a large body of work on regret style analysis for prediction. Numerous works including [12, 9] have
examined the optimal amount of regret achievable with respect to multiple experts. Many of the results in
this body of work can be found in [10]. It is well known that in the case of static experts, the optimal
regret is exactly equal to the Rademacher complexity of the predictions of the experts (chapter 8 in [10]).
Recent works, including [Tl[2[2Sj, have extended this analysis to other settings. Measures other than the
standard regret measure have been studied in [25]. Also related is the NormalHedge algorithm TT], though
it differs in both the setting and the precise algorithm. Namely, NormalHedge looks at undiscounted payoffs
and obtains strong regret guarantees to the epsilon-quantile of best experts. We look at the two experts case
(where the epsilon-quantile is not applicable) and seek to obtain provably optimal regret.

Algorithms with performance guarantees within each interval have been studied in [16, 29] and, more
recently, in [18l [2^. The question of what can be achieved if one would like to have a significantly better
guarantee with respect to a fixed arm or a distribution on arms was asked before in [14, 20]. Tradeoffs
between regret and loss were also examined in [^5], where the author studied the set of values of $a, b$ for
which an algorithm can have payoff $a \cdot \mathrm{OPT} + b \log N$, where OPT is the payoff of the best arm
and $a, b$ are constants. The problem of bit prediction was also considered in [15], where several loss
functions are considered. Numerous papers (O [19] [3]) have implemented algorithms inspired by regret style
analysis and applied them to financial and other types of data.



2 Time-Independent Prediction Algorithms 

In this section we study the optimal regret and algorithms for time-independent strategies and regret
curves. We consider the time-discounted setting, thereby proving Theorem 1.2.

As mentioned in the introduction, we consider a payoff $f$ to be feasible if there is a prediction algorithm
that achieves a payoff of at least $f(x)$ for the discounted height $x$ at all times $t \ge 1$. We will argue
that, without loss of generality, we can assume that the betting strategy $\tilde{b}(x)$ is time-independent
and the payoff always dominates the function $f(x)$.

Observe that for a time-independent betting function, the payoff function it achieves is also
time-independent in the limit.

Claim 2.1. If $f$ is feasible (in steady state), then there is a time-independent betting strategy
$\tilde{b}$ that achieves payoff function $f$.

Proof of Claim 2.1. Recall that we use a discount factor of $\rho = 1 - 1/n$, where $n \ge 1$ is the "window"
size. Assume there is a time-dependent betting strategy $\tilde{b}_t(x)$ that achieves payoff at least $f(x)$
in the steady state. We consider the average of these betting strategies over a long interval and argue that
it changes only slightly over time. Note that the time-shifted strategy $\tilde{b}_{t-i}$ also achieves payoff
at least $f(x)$ at all times. This means that an average of a large number of such shifted betting strategies
also achieves this. Consider the average strategy
$\mu_t(x) = \frac{1}{N} \sum_{i=1}^{N} \tilde{b}_{t-i}(x)$, and note that it is essentially constant over a
small window for a sufficiently large $N$. For example, if we choose $N > \exp(n)$, the differences in $\mu_t$
over a window of size $\mathrm{poly}(n)$ are exponentially small. Since we are time discounting at rate
$1 - 1/n$, it suffices to ignore anything outside such a window of size $\mathrm{poly}(n)$. □

We will characterize the payoff functions that are feasible for time-independent betting strategies.

Lemma 2.2. If there is a time-independent betting strategy with payoff function $f(x)$ then

$$f(x) \ge \frac{f(\rho x + 1) + f(\rho x - 1)}{2\rho}. \qquad (2)$$

Conversely, if $f$ satisfies the above inequality and $f(0) \le 0$, then it is feasible with unbounded bets.
In particular, the betting strategy $\tilde{b}(x) = \frac{f(\rho x + 1) - f(\rho x - 1)}{2}$ achieves a
payoff function at least $f$. For the bounded bets case, we need the additional constraint that
$\tilde{b}(x)$ computed thus satisfies $|\tilde{b}(x)| \le 1$.

Proof. Note that since the payoff at time $t$ is $\tilde{b}(x_t) b$ (where $b = b_t$), we have
$\rho f(x) + b \tilde{b}(x) \ge f(\rho x + b)$ where $b \in \{\pm 1\}$. This is because at time $t - 1$ there
is some sequence of height $x$ with payoff $f(x)$. Thus, we have $\rho f(x) + \tilde{b}(x) \ge f(\rho x + 1)$
and, similarly, $\rho f(x) - \tilde{b}(x) \ge f(\rho x - 1)$. Averaging the two and dividing by $\rho$, we get
inequality (2).

To prove the converse we can use induction on time $t$ to show that the stated betting strategy achieves
payoff at least $f(x)$. Clearly at $t = 0$, $x = 0$ and since $f(0) \le 0$ the condition is satisfied.
Further, if the height is $x$ at time $t - 1$ then at the next step the payoff is at least
$\rho f(x) + b \tilde{b}(x) \ge f(\rho x + b)$ for $b \in \{-1, 1\}$, which follows from inequality (2). □
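The converse direction of Lemma 2.2 can be exercised on a toy example. The sketch below is only an illustration: the quadratic curve $f(x) = \alpha x^2 - \alpha n$ is a hypothetical test function chosen here (it is not from the paper) because it satisfies inequality (2) with $f(0) \le 0$, and it needs unbounded bets. The code checks (2) on a grid and then simulates the strategy $\tilde{b}(x) = (f(\rho x + 1) - f(\rho x - 1))/2$ on a random sequence, tracking the smallest observed value of payoff minus $f$(height).

```python
import random

def lemma_2_2_demo(n: int = 50, alpha: float = 0.02, steps: int = 2000, seed: int = 0):
    rho = 1.0 - 1.0 / n

    def f(x: float) -> float:        # hypothetical test curve, not from the paper
        return alpha * x * x - alpha * n

    def bet(x: float) -> float:      # strategy from the lemma (unbounded bets)
        return (f(rho * x + 1) - f(rho * x - 1)) / 2.0

    # Gap in inequality (2) on a grid of heights in [-n, n]; it should be >= 0.
    grid = [-n + 0.5 * k for k in range(4 * n + 1)]
    min_gap = min(f(x) - (f(rho * x + 1) + f(rho * x - 1)) / (2 * rho) for x in grid)

    rng = random.Random(seed)
    x, payoff = 0.0, 0.0
    min_slack = payoff - f(x)        # payoff must dominate f(height) at all times
    for _ in range(steps):
        bit = rng.choice((-1, 1))
        payoff = rho * payoff + bit * bet(x)   # the bet is placed at the current height
        x = rho * x + bit                      # discounted height after seeing the bit
        min_slack = min(min_slack, payoff - f(x))
    return min_gap, min_slack

min_gap, min_slack = lemma_2_2_demo()
```

Both returned values should be nonnegative up to floating-point error; a curve that violates (2) would typically show a negative `min_gap`.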



We now proceed to proving the main claims of Theorem 1.2. In particular, we start by showing the
"converse" direction. For this, we will show that, in the limit, the payoff function has to satisfy a certain
differential equation, when properly scaled. The next lemma proves precisely this switch.

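Lemma 2.3. Let $g(x) = f(\sqrt{n} x)/\sqrt{n}$, and assume it is piecewise analytic. Then as $n \to \infty$,
condition (2) becomes

$$g'' - 2xg' + 2g \le 0. \qquad (3)$$

Proof. Rescaling and setting $\delta = 1/\sqrt{n}$ (i.e., $\rho = 1 - \delta^2$) in inequality (2) gives us:

$$(1 - \delta^2) g(x) \ge \frac{g((1 - \delta^2) x + \delta) + g((1 - \delta^2) x - \delta)}{2}.$$

Using Taylor expansion on $g$, we obtain

$$(1 - \delta^2) g(x) \ge g(x) - \delta^2 x g'(x) + (1/2) \delta^2 (1 + \delta^2 x^2) g''(x) + O(\delta^3) g'''(x - \delta^2 x \pm \delta),$$

and, after subtracting $g(x)$ and dividing by $\delta^2$,

$$-g(x) \ge -x g'(x) + (1/2)(1 + \delta^2 x^2) g''(x) + O(\delta) g'''(x - \delta^2 x \pm \delta).$$

As $\delta = 1/\sqrt{n} \to 0$, we obtain that $g'' - 2xg' + 2g \le 0$. □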
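The limit in Lemma 2.3 can be checked numerically: multiplying the slack of the rescaled discrete inequality by $2n$ should approach $g'' - 2xg' + 2g$. The sketch below uses a hypothetical smooth test function $g(x) = x^2 - 1$ (chosen here for illustration), for which the differential expression is $-2x^2$.

```python
def rescaled_gap(g, x: float, n: float) -> float:
    """2n * (average of g at (1 - d^2) x +/- d, minus (1 - d^2) g(x)), with d = 1/sqrt(n).

    The rescaled condition (2) holds at x exactly when this quantity is <= 0.
    """
    delta = n ** -0.5
    rho = 1.0 - delta * delta
    avg = 0.5 * (g(rho * x + delta) + g(rho * x - delta))
    return 2.0 * n * (avg - rho * g(x))

def g(x: float) -> float:
    """Hypothetical test function with g' = 2x and g'' = 2."""
    return x * x - 1.0

def hermite_expr(x: float) -> float:
    """g'' - 2x g' + 2g for the test function above; simplifies to -2x^2."""
    return 2.0 - 2.0 * x * (2.0 * x) + 2.0 * g(x)

gaps = {x: rescaled_gap(g, x, 1e6) for x in (0.3, 1.0, 2.0)}
```

For this test function the discrete quantity equals $-2\rho x^2$ exactly, so the agreement with $-2x^2$ improves as $1/n$.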

We note that if we replace the inequality (3) with equality, we obtain the Hermite differential equation:

$$g'' - 2xg' + 2g = 0. \qquad (4)$$

Differential equation (4) has a general solution of the form
$F_{c_1,c_2}(x) = c_1 (x \cdot \mathrm{erfi}(x) - e^{x^2}/\sqrt{\pi}) + c_2 x$, where
$\mathrm{erfi}(x) = -i \cdot \mathrm{erf}(ix)$ is the imaginary error function.
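That $F_{c_1,c_2}$ solves equation (4) can be verified directly, using $F' = c_1 \mathrm{erfi}(x) + c_2$ and $F'' = 2 c_1 e^{x^2}/\sqrt{\pi}$. The sketch below is a numerical sanity check under the assumption that erfi is evaluated by its Maclaurin series (a stdlib-only helper written for this example); it also compares the analytic first derivative with a central finite difference.

```python
import math

def erfi(x: float, terms: int = 80) -> float:
    """Imaginary error function via its Maclaurin series (adequate for moderate |x|)."""
    total, term = 0.0, x                 # term holds x**(2k+1) / k!
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= x * x / (k + 1)
    return 2.0 / math.sqrt(math.pi) * total

def F(x: float, c1: float, c2: float) -> float:
    return c1 * (x * erfi(x) - math.exp(x * x) / math.sqrt(math.pi)) + c2 * x

def dF(x: float, c1: float, c2: float) -> float:
    return c1 * erfi(x) + c2             # the x*erfi'(x) and exponential terms cancel

def d2F(x: float, c1: float, c2: float) -> float:
    return 2.0 * c1 * math.exp(x * x) / math.sqrt(math.pi)

def hermite_residual(x: float, c1: float, c2: float) -> float:
    """F'' - 2x F' + 2F, which should vanish for every x, c1, c2."""
    return d2F(x, c1, c2) - 2.0 * x * dF(x, c1, c2) + 2.0 * F(x, c1, c2)

points = [-2.0, -0.5, 0.0, 0.7, 1.5]
residuals = [hermite_residual(x, 0.8, -0.3) for x in points]
```

The derivative $F' = c_1 \mathrm{erfi}(x) + c_2$ is also the limiting betting strategy discussed in the text, which is why capping it at ±1 corresponds to bounded bets.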
Remark 2.4. Note that, for example, this "limiting payoff function" $F_{c_1,c_2}$ satisfies the "limiting"
$T \to \infty$ characterization similar to Theorem A.3. Namely, for any constants $c_1, c_2$, we have that
$\int p_\rho(x) F_{c_1,c_2}(x) \, dx = 0$, where $p_\rho(x)$ is the distribution of the $\rho$-decayed random
walk to be at height $\sqrt{n} x$, at the limit of $n, T \to \infty$. (Note that $p_\rho$ converges to
$N(0, 1)$ when $n \to \infty$.)

In the following, we show that the solutions for the steady-state payoffs are essentially characterized
by the functions $F_{c_1,c_2}$. Note that we thus obtain solutions that have only two degrees of freedom.
This is in stark contrast to the time-dependent strategies, where there is an infinite number of degrees of
freedom (see Appendix A).

The next lemma shows that if $g(x) = f(\sqrt{n} x)/\sqrt{n}$ satisfies the differential inequality, then $g$
must be dominated by $F_{c_1,c_2}$, i.e., a solution to the Hermite differential equation.

Lemma 2.5. Suppose $g$ satisfies $g'' - 2xg' + 2g \le 0$. Then there exist some $c_1, c_2$ such that
$g \le F_{c_1,c_2}$.

Proof. There is a unique solution $y = F_{c_1,c_2}$ such that $y(0) = g(0)$ and $y'(0) = g'(0)$. Now look at
$h = g - y$. We will show that $h \le 0$. Observe that $h$ satisfies $h'' - 2xh' + 2h \le 0$ and
$h(0) = h'(0) = 0$.

We will make the substitution $u = xh' - h$. Hence we have that $u' = xh''$, and thus $u'/x - 2u \le 0$,
or $u' \le 2ux$ for $x > 0$ and $u' \ge 2ux$ for $x < 0$, and $u(0) = 0$. This implies that $u \le 0$, since
$(u e^{-x^2})' = (u' - 2xu) e^{-x^2}$ is nonpositive for $x > 0$ and nonnegative for $x < 0$. This means that
$xh' - h \le 0$, i.e., $(h/x)' \le 0$ for $x \ne 0$; as $h(x)/x \to h'(0) = 0$ when $x \to 0$, this implies
$h \le 0$. □

So far we have ignored the condition that $|\tilde{b}| \le 1$, thus allowing unbounded bets. In the following
claim, we consider the case of bounded bets and show that in this case the function $g$ has a bounded
derivative.

Claim 2.6. With bounded bets $|\tilde{b}| \le 1$, the function $g$ must also satisfy the constraint
$|g'(x)| \le 1$ as $n \to \infty$.

Proof. For $b = 1$ we have that

$$(1 - \delta^2) g(x) + \delta \ge g((1 - \delta^2) x + \delta) \implies -\delta g(x) + 1 \ge \frac{g((1 - \delta^2) x + \delta) - g(x)}{\delta}.$$

Considering $\delta \to 0$ gives $g'(x) \le 1$. Similarly we get $-g'(x) \le 1$. □

Suppose we choose a solution $g = F_{c_1,c_2}$; this would correspond to the betting strategy
$\tilde{b}(x) = c_1 \cdot \mathrm{erfi}(x) + c_2$. Note that $F$ doesn't satisfy $|F'| \le 1$, but a simple
capping of its growth when $|F'| > 1$ gives an alternate function $\bar{F}$ (see figure 1(b)) that satisfies
the extra condition. This essentially corresponds to capping $\tilde{b}(x)$ so that $|\tilde{b}(x)| \le 1$.
Let $\bar{b}(x)$ denote the capped version of $\tilde{b}(x)$ that can be used for bounded bets.

This concludes the "converse" part of Theorem 1.2. Next, we switch to showing the forward direction:
if (a properly scaled) $f$ is dominated by $F$, then it is also a valid payoff function. In particular, in the
next lemma, we show that the solutions to the differential inequality can be made to satisfy the original
recursive inequality (2) with a small error term.

Lemma 2.7. For any constants $c_1, c_2$, for the bounded bets case, there is a function
$g(x) = \bar{F}(x) - O(1/\sqrt{n})$ such that $\sqrt{n} \cdot g(x/\sqrt{n})$ satisfies the inequality (2).
With unbounded bets, there is a function $g(x) = F(x) \cdot e^{-O(x^2/n + 1/n)}$ such that
$\sqrt{n} \cdot g(x/\sqrt{n})$ satisfies the inequality (2).

Proof. Let $\delta = 1/\sqrt{n}$. We will argue that the $O(\delta)$ slack is sufficient to account for the
error in the Taylor approximation in the bounded bets case. To see this note that the error in the Taylor
approximation is $\delta^2 x^2 g''(x) + O(\delta) g'''((1 - \delta^2) x \pm \delta)$, where
$g'''(x \pm \epsilon)$ denotes the average of $g'''$ at two points in the range $x \pm \epsilon$. We will look
at the interval where $|F'(x)| \le 1$. For constant $c_1, c_2$ the end points of this interval are also
constants, which implies that all the terms in the error expression are constants (since $F$ is independent
of $\delta$). Thus the error is at most $O(\delta)$. So it suffices to satisfy the condition
$g'' - 2xg' + 2g \le -O(\delta)$, which is satisfied by $F - O(\delta)$. For the region where $|F'(x)| > 1$,
note that in $\bar{F}$ we are capping $|\bar{F}'(x)| = 1$ and since $g$ is $\bar{F}$ shifted down, it also
satisfies the inequality.

For the case of unbounded bets, observe that the recursive inequality holds if we satisfy the following,
per the approximation of the Taylor series:

$$g''(x - x\delta^2 \pm \delta)(1 + x^2 \delta^2) - 2xg'(x) + 2g(x) \le 0.$$

For simplicity of explanation, consider $x > 0$. Note that $\delta x \le 1$. We have
$F''(x) = \frac{2 c_1}{\sqrt{\pi}} e^{x^2}$. Suppose we look for a function $g$ that satisfies: $g''(x)$ is
even and is increasing in $x$ when $x > 0$, and
$g''(x \pm \epsilon) \le g''(x)/e^{-O(x^2 \delta^2 + \delta^2)}$ for a big enough constant in the $O$; we
will later verify that our resulting $g$ indeed satisfies this (note that this is satisfied for $F$). Then,
since $(1 + x^2 \delta^2) \le e^{x^2 \delta^2}$, the above inequality is satisfied as long as

$$g''(x) \le e^{-O(x^2 \delta^2 + \delta^2)} \cdot 2 (x g'(x) - g(x)).$$

Again, for $u = xg' - g$, we get $u'/x \le e^{-O(x^2 \delta^2 + \delta^2)} \cdot 2u$, which holds if
$u \le e^{x^2 - O(x^2 \delta^2 + \delta^2)}$. So it suffices that
$xg' - g = e^{x^2 - O(x^2 \delta^2 + \delta^2)}$. Dividing by $x^2$, we get
$(g/x)' = e^{x^2 - O(x^2 \delta^2 + \delta^2)}/x^2$. Note that without the correction terms the earlier
differential equation $(g/x)' = e^{x^2}/x^2$ has the solution $g = F$, and so for the new equation there is a
solution $g = F e^{-O(x^2 \delta^2 + \delta^2)}$. Note that this $g$ also satisfies
$g''(x \pm \epsilon) \le g''(x)/e^{-O(x^2 \delta^2 + \delta^2)}$ that we had assumed. □



Our Theorem 1.2 is hence concluded by Lemma 2.2 and Lemma 2.7. In the following we remark that
obtaining a (non-trivial) solution that preserves (2) precisely, as an equality, is impossible.

Remark 2.8. If we convert the condition (2) into an equality then the only satisfying analytic solution $f$
is $f(x) = cx$ for a constant $c$. Thus the relaxation into an inequality seems to be necessary to find all
the feasible payoff functions.

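Proof. If we require the equality $f(x) = \frac{f(\rho x + 1) + f(\rho x - 1)}{2\rho}$ then applying this
recursively $i$ times gives:

$$f(x) = \frac{1}{\rho^i} \cdot \frac{\sum_{b_1, \ldots, b_i \in \{-1, 1\}} f(\rho^i x + \rho^{i-1} b_1 + \cdots + b_i)}{2^i}.$$

We apply Taylor series to $f(\rho^i x + \rho^{i-1} b_1 + \cdots + b_i)$ around the point
$y = \rho^{i-1} b_1 + \cdots + b_i$ to conclude that
$f(y + \rho^i x) = f(y) + \rho^i x f'(y) + \rho^{2i} x^2 f''(y \pm \rho^i x)$. We now consider the following
difference

$$f(x) - f(0) = \frac{1}{\rho^i} \cdot \frac{\sum_{b_1, \ldots, b_i \in \{-1, 1\}} f(\rho^i x + \rho^{i-1} b_1 + \cdots + b_i) - f(\rho^{i-1} b_1 + \cdots + b_i)}{2^i}$$

and using the above expansion, we have

$$f(x) - f(0) = \frac{\sum_{b_1, \ldots, b_i \in \{-1, 1\},\; y = \rho^{i-1} b_1 + \cdots + b_i} x f'(y) + \rho^i x^2 f''(y \pm \rho^i x)}{2^i}.$$

Taking $i$ to infinity, we conclude that $f(x) - f(0) = x \cdot \int p_\rho(y) f'(y) \, dy$. This implies
that $f$ is of the form $f(x) = cx + a$. Moreover, substituting $f$ into the equality condition, we obtain
that $a = 0$. □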
A Prediction Algorithms for Fixed Stopping Time

In this section we discuss the optimal regret and the corresponding betting algorithm for a fixed stopping
time $T$, which leads to strategies that depend on the current time $t$ and the stopping time $T$. We
consider the classical non-discounted setting (Theorem 1.1) and the time-discounted setting (Theorem A.3),
both with fixed stopping time.

In the non-discounted setting, we show that the optimal regret and algorithm follow easily from the
existing work of [12].

We note that the resulting prediction algorithms depend on the current time $t$ and the stopping time
$T$. We will consider the admittedly more interesting case, of time-independent strategies, in the next
section.



A.1 Non-discounted setting

Cover [12] gave a precise characterization of the attainable payoff curves. First of all, he showed that if, for a sequence $b \in \{\pm 1\}^T$, we denote by $g(b)$ the payoff (score) obtained on $b$, then $\sum_b g(b) = 0$ for all possible algorithms. Cover proves the following characterization of the curve as a function of the height of the sequence:

Theorem A.1 ([12]). Let $f : \mathbb{N} \to \mathbb{R}$ be the payoff function of an algorithm, where $f(x)$ is the payoff of the algorithm on sequences of height exactly $x$. Then $f$ is feasible if and only if: 1) $\mathbb{E}_{b \in \{\pm 1\}^T}\big[f\big(\sum_t b_t\big)\big] = 0$, and 2) $|f(x+1) - f(x)| \le 1$ ($f$ is Lipschitz).

From the above theorem we have the following corollary. 



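Corollary A.2. $f(x) = |x| - R$ is feasible for $R = \sqrt{\tfrac{2}{\pi} T} + O(1)$, and this is the minimum $R$ for which this is feasible.

Proof. Note that Theorem A.1 holds for the payoff function $f(x) = |x| - R$, where $R = \mathbb{E}_{b \in \{\pm 1\}^T}\big[\big|\sum_t b_t\big|\big]$.

To compute this value $R$, we use the following standard approximation:

$$\Big| \mathbb{E}_{b \in \{\pm 1\}^T}\Big[\Big|\sum_t b_t\Big|\Big] - \mathbb{E}_{x \sim \phi}\big[|\sqrt{T}\, x|\big] \Big| \le O(1),$$

where $\phi(x)$ is the normal distribution (see, e.g., [23], Theorem 3.4). Furthermore, we have that $\mathbb{E}_{x \sim \phi}\big[|\sqrt{T}\, x|\big] = \sqrt{\tfrac{2}{\pi} T}$. The corollary follows. □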



To recover the actual prediction algorithm, we employ the following standard dynamic programming. Namely, define $s_t(x)$ to be the minimal necessary algorithm payoff, after the $t$-th time step for height $x$, in order to obtain payoffs of $s_T(y) = f(y) = |y| - R$. In particular, if $\tilde b_t(x)$ denotes the prediction (bet) at time $t$ assuming the current height is $x$, we have that

$$s_t(x) = \min_{|\tilde b_{t+1}(x)| \le 1} \max\big\{ s_{t+1}(x+1) - \tilde b_{t+1}(x),\; s_{t+1}(x-1) + \tilde b_{t+1}(x) \big\}.$$

Suppose we ignore the boundedness of $\tilde b_t(x)$; then the minimum is achieved for $\tilde b_{t+1}(x) = \frac{1}{2}\big(s_{t+1}(x+1) - s_{t+1}(x-1)\big)$. Note that this way we obtain $s_0(0) = \mathbb{E}_{b \in \{\pm 1\}^T}\big[f\big(\sum_i b_i\big)\big] = 0$ (which gives a different proof of the above theorem). But these values of $\tilde b$ actually satisfy $|\tilde b_t(x)| \le 1$, since if the Lipschitz condition holds at time $t$, then it also holds at time $t-1$. Hence there was no loss of generality in dropping the boundedness of the $\tilde b_t(x)$'s. In particular, we have that

$$\tilde b_t(x) = \frac{1}{2^{T-t+1}} \sum_{b \in \{\pm 1\}^{T-t}} \Big( \Big| x + 1 + \sum_j b_j \Big| - \Big| x - 1 + \sum_j b_j \Big| \Big).$$

This concludes the proof of Theorem 1.1, which gives the optimal regret and prediction algorithm for the vanilla fixed stopping time setting. Note that the prediction algorithm $\tilde b_t(x)$ depends on the current time: for example, for $t$ close to $T$ the bet values are close to 1, whereas for small $t$ we obtain very small values of $\tilde b_t(x)$.
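The dynamic program above can be sketched in a few lines. The code below is a minimal illustration rather than the authors' implementation: the function names (`expected_abs_height`, `make_strategy`, `play`) are hypothetical, and the indexing convention (the bet for step $t$ is computed from $s_t$ at the two neighbouring heights) is an assumption chosen to match the recurrence in the text.

```python
from functools import lru_cache
from math import comb


def expected_abs_height(T):
    """Exact E|b_1 + ... + b_T| over uniform b in {-1,+1}^T (the regret R)."""
    return sum(comb(T, k) * abs(2 * k - T) for k in range(T + 1)) / 2 ** T


def make_strategy(T):
    """Build s_t(x) and the unconstrained optimal bet for target f(y) = |y| - R."""
    R = expected_abs_height(T)

    @lru_cache(maxsize=None)
    def s(t, x):
        # Payoff the algorithm must already hold after t steps at height x.
        if t == T:
            return abs(x) - R
        return 0.5 * (s(t + 1, x + 1) + s(t + 1, x - 1))

    def bet(t, x):
        # Bet on step t (1-based) when the height before that step is x.
        return 0.5 * (s(t, x + 1) - s(t, x - 1))

    return R, s, bet


def play(bits, bet):
    """Run the strategy on a +/-1 sequence; return (total payoff, final height)."""
    payoff, height = 0.0, 0
    for t, b in enumerate(bits, start=1):
        payoff += b * bet(t, height)
        height += b
    return payoff, height


R, s, bet = make_strategy(8)
payoff, height = play((1, 1, -1, 1, 1, 1, -1, 1), bet)
```

On any sequence the final payoff equals $|x_T| - R$, and the bets stay in $[-1, 1]$ because the target function is 1-Lipschitz.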
A.2 Time-discounted setting



We prove the following theorem for the time-discounted setting with fixed stopping time $T$, by extending the characterization given in Section A.1.

Theorem A.3. Consider the problem of time-discounted prediction of a binary sequence for "window size" $n$. Fix the discount factor $\rho = 1 - 1/n$. For any fixed time $T$, $f$ is feasible iff $\int f(x)\, p_T(x)\, dx = 0$, where $p_T(x)$ is the probability that a (decayed) random walk ends at height $x$, and $f$ is 1-Lipschitz (for bounded bet values). There is an algorithm (betting strategy) that achieves the optimal regret and has $f(x) = |x| - R_\rho$, where

$$R_\rho = \min_f R_T(f) = \sqrt{\tfrac{2}{\pi}\, a} \pm O(1), \qquad a = \frac{1 - \rho^{2T}}{1 - \rho^2}.$$

Note that $a \to T$ when $T \ll n$, and $a \to n/2$ when $T \gg n$. The betting strategy may be computed via dynamic programming.

First, we need to count the number of random walks achieving a certain discounted height $x$. When the height was not discounted, this was simply a binomial distribution, which we approximated by a normal distribution. It turns out that, in the discounted case, the height distribution also approaches a normal distribution in the limit. Specifically, we show the following lemma.

Lemma A.4. Consider the time-discounted setting, with discount $\rho = 1 - 1/n$ for some $n > 1$. Let $p_T(x)$ be the probability that a random binary sequence of length $T$ has discounted height $x \in [-n, n]$. Then, as $T$ goes to infinity, the probability distribution of the discounted height, scaled down by $\sqrt{a}$, converges to the normal distribution $N(0,1)$, where $a = \frac{1 - \rho^{2T}}{1 - \rho^2}$.

Furthermore,

$$\mathbb{E}_{b \in \{\pm 1\}^T}\Big[\Big|\sum_{i \ge 1} b_i \rho^{T-i}\Big|\Big] = \sqrt{\tfrac{2}{\pi}\, a} \pm O(1).$$



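Proof. Note that the height is distributed as $x = \sum_{i=1}^{T} b_i \rho^{T-i}$, where the $b_i$ are random $\pm 1$. Then, by the Lyapunov central limit theorem, we have that $\frac{1}{\sqrt{a}} \sum_{i=1}^{T} b_i \rho^{T-i}$ tends to $N(0,1)$ as long as $T = \omega_n(1)$.

Again, we have that (see, e.g., [23], Theorem 3.4)

$$\mathbb{E}_{b \in \{\pm 1\}^T}\Big[\Big|\sum_{i \ge 1} b_i \rho^{T-i}\Big|\Big] = \mathbb{E}_{x \sim \phi}\big[|\sqrt{a} \cdot x|\big] \pm O(1),$$

where $a = \sum_{i=0}^{T-1} \rho^{2i} = \frac{1 - \rho^{2T}}{1 - \rho^2}$. Hence, we obtain that $\mathbb{E}_{b \in \{\pm 1\}^T}\big[\big|\sum_{i \ge 1} b_i \rho^{T-i}\big|\big] = \sqrt{\tfrac{2}{\pi}\, a} \pm O(1)$.

□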
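As an illustration of the lemma, the following sketch computes the variance $a$ of the discounted height in closed form and compares the exact value of $\mathbb{E}\big|\sum_i b_i \rho^{T-i}\big|$, obtained by brute-force enumeration, with the Gaussian estimate $\sqrt{2a/\pi}$. The function names and the small parameter choices ($n = 10$, $T = 14$) are assumptions made only to keep the enumeration cheap.

```python
from itertools import product
from math import pi, sqrt


def discounted_variance(T, rho):
    """a = sum_{i=0}^{T-1} rho^(2i) = (1 - rho^(2T)) / (1 - rho^2)."""
    return (1 - rho ** (2 * T)) / (1 - rho ** 2)


def exact_mean_abs_height(T, rho):
    """Exact E|sum_i b_i rho^(T-i)| by enumerating all 2^T sign sequences."""
    weights = [rho ** (T - i) for i in range(1, T + 1)]
    total = 0.0
    for bits in product((-1, 1), repeat=T):
        total += abs(sum(b * w for b, w in zip(bits, weights)))
    return total / 2 ** T


n, T = 10, 14                      # assumed small window size and horizon
rho = 1 - 1 / n
a = discounted_variance(T, rho)
exact = exact_mean_abs_height(T, rho)
gaussian = sqrt(2 * a / pi)        # Gaussian estimate from the lemma

# The two regimes noted in Theorem A.3:
a_short = discounted_variance(10, 1 - 1e-6)    # T << n, so a is close to T = 10
a_long = discounted_variance(5000, 0.99)       # T >> n, so a is close to n/2 = 50
```

The exact and Gaussian values differ by an additive constant, which is all the lemma claims.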



The rest of the proof of Theorem A.3 follows along the same lines as that of Theorem 1.1. Specifically, one can employ the same dynamic programming (for all possible discounted heights). We again have that $s_0(0) = \mathbb{E}_{x \sim p_T}[f(x)]$ for any desired target function $f$. The only way to have $s_0(0) = 0$ is when $\mathbb{E}_{x \sim p_T}[f(x)] = 0$. As long as $f$ is also Lipschitz, the dynamic programming will recover the betting strategy with bounded bets $|\tilde b_t(x)| \le 1$. As in the previous setting, note that the betting strategy $\tilde b_t$ depends on the time $t$: it is small at the beginning, and gets closer to 1 for large values of $t$ (close to $T$).



B Trade-off with two experts 

In this section we prove Theorem 1.3 by establishing an equivalence between the sequence prediction problem and the two-experts problem. In each round of the experts problem, each expert has a payoff in the range $[0, 1]$ that is unknown to the algorithm. For two experts, let $b_{1,t}, b_{2,t}$ denote the payoffs of the two experts. The algorithm pulls each arm (expert) with probability $\tilde b_{1,t}, \tilde b_{2,t} \in [0, 1]$ respectively, where $\tilde b_{1,t} + \tilde b_{2,t} = 1$. The payoff of the algorithm is $A = \sum_{t=1}^{T} b_{1,t} \tilde b_{1,t} + b_{2,t} \tilde b_{2,t}$. Let $X_i = \sum_{t=1}^{T} b_{i,t}$. We study the regret trade-off $R_1, R_2$ with respect to these two experts, which means that $A \ge X_1 - R_1$ and $A \ge X_2 - R_2$.

To do so, we translate the problem into an instance of the sequence prediction problem, where we show how to obtain a tradeoff between regret $R$ and loss $L$, which is defined as the minimum payoff of the algorithm. With two experts, the regret/loss tradeoff in the sequence prediction problem is related to the regret trade-off for the two-experts problem. Let $R, L$ be feasible upper bounds on the regret and loss in the sequence prediction problem in the worst case. Let $R_0, L_0$ be feasible upper bounds on the regret and loss in the version of the sequence prediction problem with one-sided bets (that is, $\tilde b_t$ cannot be negative; the feasible payoff curves for this case are a simple variant of $F$ in which $F'$ is capped to lie in $[0, 1]$). Let $R_1, R_2$ be feasible upper bounds on the regret with respect to expert one and expert two in the worst case. Another variant that has been asked before is a tradeoff between regret to the average and regret to the max (see [21 HO]). Let $R_m, R_a$ be feasible upper bounds on the regret to the max and regret to the average with two experts in the worst case.

Theorem 1.3 follows from the following two lemmas.



Lemma B.1. Regret and loss $R, L$ is feasible in the sequence prediction problem if and only if $R_m = R/2$, $R_a = L/2$ is feasible for regret to the max and regret to the average in the two-experts problem.

$R_0, L_0$ is feasible in the sequence prediction problem (with one-sided bets) if and only if $R_1 = L_0$, $R_2 = R_0$ is feasible for regret to the first expert and regret to the second expert in the two-experts setting.

For $x > 0$, let $T(x) = h(g^{-1}(x))$, where $g(x) = e^{x^2}$ and $h(x) = \mathrm{erfi}(x)$. Note that $T(x) = \mathrm{erfi}(\sqrt{\ln x})$.

Proof of Lemma B.1. First we look at the reduction from the regret to the average and regret to the max problem. We can reduce this problem to our sequence prediction problem by producing at time $t$ the value $b_t = (b_{1,t} - b_{2,t})/2$. A bet $\tilde b_t$ in our sequence prediction problem can be translated back into probabilities $\tilde b_{1,t} = (1 + \tilde b_t)/2$ and $\tilde b_{2,t} = (1 - \tilde b_t)/2$ for the two experts. A payoff $A$ in the original problem gets translated into payoff $\sum_t b_{1,t}(1 + \tilde b_t)/2 + b_{2,t}(1 - \tilde b_t)/2 = (X_1 + X_2)/2 + A$ in the two-experts case. In this reduction the loss $L$ gets mapped to $R_a$ and the regret $R$ gets mapped to $R_m$. However, note that $|b_t|$ now lies in the range $[0, 1/2]$. Therefore we need to scale it by 2 to reduce it to the standard version of the original problem.

Conversely, given a sequence $b_t$ of the prediction problem, we can convert it into two experts with payoffs $b_{1,t} = (1 + b_t)/2$, $b_{2,t} = (1 - b_t)/2$. The average expert has payoff $T/2$. A payoff of $A$ in the prediction problem can be obtained from a sequence of arm-pulling probabilities with payoff $T/2 + A/2$ by interpreting the arm-pulling probabilities as $(1 \pm \tilde b_t)/2$, since $\sum_t \frac{(1 + b_t)}{2}\frac{(1 + \tilde b_t)}{2} + \frac{(1 - b_t)}{2}\frac{(1 - \tilde b_t)}{2} = T/2 + A/2$.

Next we look at regrets $R_1, R_2$ with respect to the two experts. Given a sequence of payoffs for the two experts, we can reduce it to a sequence for the (one-sided) prediction problem by setting $b_t = b_{2,t} - b_{1,t}$. A bet $\tilde b_t$ in the prediction problem can be translated to probabilities $\tilde b_{1,t} = 1 - \tilde b_t$ and $\tilde b_{2,t} = \tilde b_t$ for the two experts. A payoff $A$ in the prediction problem gets translated into payoff $\sum_t (1 - \tilde b_t) b_{1,t} + \tilde b_t b_{2,t} = X_1 + A$ in the two-experts case, where zero regret in the prediction problem would correspond to $A = X_2 - X_1$. Thus a loss of $L_0$ translates to a regret $R_1 = L_0$ with respect to the first arm, and a regret $R_0$ translates to a regret $R_2 = R_0$ with respect to the second arm. Thus if $R_0, L_0$ is feasible then so is $R_1 = L_0$, $R_2 = R_0$.

Conversely, given an instance of the prediction problem with one-sided bets, we can convert it to a version of the two-armed problem by setting $b_{2,t} = b_t$, $b_{1,t} = 0$ if $b_t > 0$, and $b_{2,t} = 0$, $b_{1,t} = -b_t$ otherwise. A bet $\tilde b_t$ is used in our original problem if the arms are pulled with probabilities $1 - \tilde b_t$ and $\tilde b_t$ respectively. The payoff in the experts problem is $X_1 + \sum_t \tilde b_t (b_{2,t} - b_{1,t})$. The regrets $R_1, R_2$ translate to $L_0 = R_1$, $R_0 = R_2$ in the prediction problem with one-sided bets.

The above reduction also works for the time-discounted case. □
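The two payoff identities used in the reduction are easy to verify numerically. The sketch below does so on small hand-picked inputs; the helper names and all sample values are hypothetical and serve only as an illustration.

```python
def prediction_payoff(bits, bets):
    """Payoff sum_t b_t * bet_t in the sequence prediction problem."""
    return sum(b * bt for b, bt in zip(bits, bets))


def experts_payoff(payoffs1, payoffs2, probs1):
    """Expected payoff when expert 1 is followed with probability probs1[t]."""
    return sum(p * u + (1 - p) * v for u, v, p in zip(payoffs1, payoffs2, probs1))


# Hypothetical expert payoffs in [0, 1].
payoffs1 = [0.9, 0.2, 0.5, 1.0, 0.0]
payoffs2 = [0.1, 0.7, 0.5, 0.3, 0.6]
X1, X2 = sum(payoffs1), sum(payoffs2)

# Max/average reduction: b_t = (b_{1,t} - b_{2,t}) / 2, bets in [-1, 1],
# expert 1 is pulled with probability (1 + bet) / 2.
bets = [0.5, -1.0, 0.0, 0.25, -0.4]
bits = [(u - v) / 2 for u, v in zip(payoffs1, payoffs2)]
A = prediction_payoff(bits, bets)
two_sided = experts_payoff(payoffs1, payoffs2, [(1 + bt) / 2 for bt in bets])

# One-sided reduction: b_t = b_{2,t} - b_{1,t}, bets in [0, 1],
# expert 2 is pulled with probability bet (so expert 1 with 1 - bet).
one_sided_bets = [0.0, 1.0, 0.3, 0.2, 0.8]
bits_one = [v - u for u, v in zip(payoffs1, payoffs2)]
A_one = prediction_payoff(bits_one, one_sided_bets)
one_sided = experts_payoff(payoffs1, payoffs2, [1 - bt for bt in one_sided_bets])
```

Here `two_sided` equals `(X1 + X2) / 2 + A` and `one_sided` equals `X1 + A_one`, matching the two translations in the proof.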

Lemma B.2. Let $R, L, R_0, L_0$ be normalized by a factor $\sqrt{n}$ (scaled down). $R, L$ is feasible in the original problem if and only if $T(R/L) = 1/(\sqrt{\pi} L)$.

$R_0, L_0$ is feasible in the original problem (with one-sided bets) if and only if there is an $\alpha > 0$ so that $T(\alpha L_0) + T(\alpha R_0) \ge \alpha/\sqrt{\pi}$.



Proof of Lemma B.2. The best tradeoff for $R, L$ is attained when $F$ is symmetric; that is, $F = c_1\big(x\,\mathrm{erfi}(x) - e^{x^2}/\sqrt{\pi}\big)$ with the slope capped in the interval $[-1, 1]$. Here $L = c_1/\sqrt{\pi}$ corresponds to the minimum value, attained at $x = 0$. $R$ is obtained by looking at $x - F$ at the point $x_0$ where $F' = 1$, giving $c_1\,\mathrm{erfi}(x_0) = 1$ and implying $R = x - F = c_1 e^{x_0^2}/\sqrt{\pi}$. Thus $1/(\sqrt{\pi} L) = \mathrm{erfi}(x_0)$ and $R/L = e^{x_0^2}$, implying $T(R/L) = 1/(L\sqrt{\pi})$.

In the case of one-sided bets, we look at the curve $F = c_1\big(x\,\mathrm{erfi}(x) - e^{x^2}/\sqrt{\pi}\big) + c_2 x$, where additionally the derivative is capped in the interval $[0, 1]$. The loss $L_0$ is maximized at the minimum point $x_1$ where $F' = 0$, giving $c_1\,\mathrm{erfi}(x_1) + c_2 = 0$ and implying $L_0 = -F(x_1) = c_1 e^{x_1^2}/\sqrt{\pi}$. The regret $R_0$ is maximized at $x_0$ where $F' = 1$ (which means $c_1\,\mathrm{erfi}(x_0) + c_2 = 1$), giving $R_0 = x - F = c_1 e^{x_0^2}/\sqrt{\pi}$. Since $e^{x^2}$ is even and $\mathrm{erfi}(x)$ is odd, $T(L_0\sqrt{\pi}/c_1) = |c_2/c_1|$ and $T(R_0\sqrt{\pi}/c_1) = |(1 - c_2)/c_1|$. For a given $c_1 > 0$ (as otherwise the regret is infinite), a $c_2$ exists if and only if $T(L_0\sqrt{\pi}/c_1) + T(R_0\sqrt{\pi}/c_1) \ge 1/c_1$. □
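To make the symmetric tradeoff concrete, the sketch below evaluates $T(x) = \mathrm{erfi}(\sqrt{\ln x})$ and solves $T(R/L) = 1/(\sqrt{\pi} L)$ for $R$ given a normalized loss $L$. It is an illustration under stated assumptions: `erfi` is implemented through its Maclaurin series to avoid external dependencies (adequate only for moderate arguments), and `regret_for_loss` is a hypothetical helper that uses plain bisection.

```python
from math import exp, log, pi, sqrt


def erfi(x, terms=200):
    """Imaginary error function via its Maclaurin series (fine for moderate |x|)."""
    total, term = 0.0, x              # term holds x^(2k+1) / k!
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= x * x / (k + 1)
    return 2 / sqrt(pi) * total


def T(x):
    """T(x) = erfi(sqrt(ln x)); real-valued for x >= 1."""
    return erfi(sqrt(log(x)))


def regret_for_loss(L, tol=1e-12):
    """Solve T(R / L) = 1 / (sqrt(pi) * L) for R by bisection on x0 = sqrt(ln(R / L))."""
    target = 1 / (sqrt(pi) * L)
    lo, hi = 0.0, 1.0
    while erfi(hi) < target:          # erfi is increasing, so this brackets x0
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erfi(mid) < target:
            lo = mid
        else:
            hi = mid
    x0 = (lo + hi) / 2
    return L * exp(x0 * x0)


L = 0.5                               # assumed normalized loss
R = regret_for_loss(L)
```

The returned pair satisfies `T(R / L) == 1 / (sqrt(pi) * L)` up to numerical tolerance, which is the condition of the first part of Lemma B.2.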
C Multi-scale Optimal Regret



We now show how the framework can be extended to multiple time scales. The sequence $b_t$ may have trends at some unknown time scale, and therefore it is important that the algorithm has small regret not just at one time scale but simultaneously at many time scales. We will now prove that (with unbounded bets) there are (normalized) payoff functions $g_1(x_1, x_2)$ and $g_2(x_1, x_2)$ at time scales $an$ and $bn$ if and only if they satisfy the conditions in Theorem 1.4.

Proof of Theorem 1.4. If $\tilde b(x_1, x_2)$ is the betting function, then as before we get $\rho_1 f_1(x_1, x_2) + b\,\tilde b(x_1, x_2) \ge f_1(\rho_1 x_1 + b, \rho_2 x_2 + b)$ for $b \in \{-1, 1\}$ and $\rho_2 f_2(x_1, x_2) + b\,\tilde b(x_1, x_2) \ge f_2(\rho_1 x_1 + b, \rho_2 x_2 + b)$ for $b \in \{-1, 1\}$. Further, these conditions are sufficient. Simplifying, we get

$$\rho_1 f_1(x_1, x_2) + \tilde b(x_1, x_2) \ge f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1)$$
$$\rho_1 f_1(x_1, x_2) - \tilde b(x_1, x_2) \ge f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)$$

This is satisfied if and only if

$$\rho_1 f_1(x_1, x_2) - \tfrac{1}{2}\big(f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1) + f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big) \ge \Big| \tfrac{1}{2}\big(f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1) - f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big) - \tilde b(x_1, x_2) \Big|.$$

To see this, note that if $\tilde b(x_1, x_2) = \tfrac{1}{2}\big(f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1) - f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big)$ then the two inequalities become identical. Otherwise we can denote the difference by $\Delta$, and we get that the left-hand side has to be $\ge \pm\Delta$.

Similarly we get

$$\rho_2 f_2(x_1, x_2) - \tfrac{1}{2}\big(f_2(\rho_1 x_1 + 1, \rho_2 x_2 + 1) + f_2(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big) \ge \Big| \tfrac{1}{2}\big(f_2(\rho_1 x_1 + 1, \rho_2 x_2 + 1) - f_2(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big) - \tilde b(x_1, x_2) \Big|.$$

We can write these as $L_1 \ge |R_1 - \tilde b|$ and $L_2 \ge |R_2 - \tilde b|$.

Note that for such a $\tilde b$ to exist it is necessary and sufficient that $L_1 + L_2 \ge |R_1 - R_2|$ and $L_1 \ge 0$ and $L_2 \ge 0$.
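The existence condition for the common bet is an interval-intersection statement: the bet must lie in both $[R_1 - L_1, R_1 + L_1]$ and $[R_2 - L_2, R_2 + L_2]$. A minimal sketch follows, in which the function name `feasible_bet` and the choice of returning the midpoint of the intersection are assumptions made for illustration.

```python
def feasible_bet(L1, R1, L2, R2):
    """Return a bet b with L1 >= |R1 - b| and L2 >= |R2 - b|, or None if none exists."""
    if L1 < 0 or L2 < 0:
        return None
    lo = max(R1 - L1, R2 - L2)   # left end of the intersection of the two intervals
    hi = min(R1 + L1, R2 + L2)   # right end of the intersection
    if lo > hi:                  # empty intersection, i.e. L1 + L2 < |R1 - R2|
        return None
    return (lo + hi) / 2


good = feasible_bet(1.0, 0.0, 0.5, 1.2)   # L1 + L2 = 1.5 >= |R1 - R2| = 1.2
bad = feasible_bet(0.2, 0.0, 0.3, 1.0)    # L1 + L2 = 0.5 <  |R1 - R2| = 1.0
```

A bet is returned exactly when $L_1 + L_2 \ge |R_1 - R_2|$ with $L_1, L_2 \ge 0$, which is the condition stated above.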

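Now, rescaling into the functions $g_1$ and $g_2$, we get

$$L_1 = \rho_1 f_1(x_1, x_2) - \tfrac{1}{2}\big(f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1) + f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big)$$
$$= (1 - a^2\delta^2)\, g_1(x_1, x_2) - \tfrac{1}{2}\Big(g_1\big((1 - a^2\delta^2)x_1 + \delta, (1 - b^2\delta^2)x_2 + \delta\big) + g_1\big((1 - a^2\delta^2)x_1 - \delta, (1 - b^2\delta^2)x_2 - \delta\big)\Big)$$
$$= -a^2\delta^2 g_1(x_1, x_2) + \delta^2\Big(a^2 x_1 \frac{\partial}{\partial x_1} + b^2 x_2 \frac{\partial}{\partial x_2}\Big) g_1 - \frac{\delta^2}{2}\Big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\Big)^2 g_1 + O(\delta^3).$$

Dividing by $\delta^2$ and taking the limit as $\delta \to 0$, we get $-a^2 g_1(x_1, x_2) + \big(a^2 x_1 \frac{\partial}{\partial x_1} + b^2 x_2 \frac{\partial}{\partial x_2}\big) g_1 - \frac{1}{2}\big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\big)^2 g_1$.

Thus we have $E_1 = -a^2 g_1(x_1, x_2) + \big(a^2 x_1 \frac{\partial}{\partial x_1} + b^2 x_2 \frac{\partial}{\partial x_2}\big) g_1 - \frac{1}{2}\big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\big)^2 g_1 \ge 0$ and $E_2 = -b^2 g_2(x_1, x_2) + \big(a^2 x_1 \frac{\partial}{\partial x_1} + b^2 x_2 \frac{\partial}{\partial x_2}\big) g_2 - \frac{1}{2}\big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\big)^2 g_2 \ge 0$.

Now $R_1 = \tfrac{1}{2}\big(f_1(\rho_1 x_1 + 1, \rho_2 x_2 + 1) - f_1(\rho_1 x_1 - 1, \rho_2 x_2 - 1)\big)$. After scaling, this becomes, in the limit,

$$\tfrac{1}{2}\Big(g_1\big((1 - a^2\delta^2)x_1 + \delta, (1 - b^2\delta^2)x_2 + \delta\big) - g_1\big((1 - a^2\delta^2)x_1 - \delta, (1 - b^2\delta^2)x_2 - \delta\big)\Big) = \delta^2\Big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\Big) g_1.$$

Dividing by $\delta^2$ we get: $E_1 + E_2 \ge \big|\big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\big)(g_1 - g_2)\big|$. □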